[57] ABSTRACT

A multimodal natural language interface interprets user 
requests combining natural language input from the user 
with information selected from a current application and 
sends the request in the proper form to an appropriate 
auxiliary application for processing. The multimodal natural 
language interface enables users to combine natural language
(spoken, typed or handwritten) input with input selected by any
standard means from an application the user is running (the
current application) to perform a task in another application
(the auxiliary application) without either leaving the current
application, opening new windows, etc., or determining in
advance of running the current application what actions are
to be done in the auxiliary application. The multimodal
natural language interface carries out the following functions:
(1) parsing of the combined multimodal input; (2)
semantic interpretation (i.e., determination of the request
implicit in the parse); (3) dialog providing feedback to the
user indicating the system's understanding of the input and
interacting with the user to clarify the request (e.g., missing
information and ambiguities); (4) determination of which
application should process the request and application pro- 
gram interface (API) code generation; and (5) presentation 
of a response as may be applicable. Functions (1) to (3) are 
carried out by the natural language processor, function (4) is 
carried out by the application manager, and function (5) is 
carried out by the response generator. 

8 Claims, 7 Drawing Sheets 
MULTIMODAL NATURAL LANGUAGE
INTERFACE FOR CROSS-APPLICATION
TASKS

BACKGROUND OF THE INVENTION 

1. Field of the Invention 

The present invention generally relates to user interfaces 
for computer systems and, more particularly, to a multimodal
natural language interface that allows users of computer
systems conversational and intuitive access to multiple
applications. The term "multimodal" refers to combining
input from various modalities; e.g., combining spoken, typed
or handwritten input from the user.

2. Description of the Prior Art 

Since the introduction of the personal computer, it has 
been a goal to make using such a computer easier. This goal 
recognizes that greater numbers of people are using com- 
puters in their daily lives and business and that the majority 
of the people using computers have little training in their 
use. The term "user friendly" was coined to describe applications
running on computers which required minimal training
for a user to be able to effectively use those applications
and become productive. In a business context, training 
employees in the use of a computer can be a very expensive 
overhead cost to the business. 

The graphic user interface (GUI) was introduced by the 
Xerox Palo Alto Research Center (PARC) and made popular 
by the Apple Macintosh computers. The GUI is often
described as a "point-and-click" interface because a cursor 
pointing device, such as a mouse, trackball or the like, is 
used to move a cursor on the display to an icon or command 
bar where the user simply "clicks" or, in some cases, double
"clicks" a mouse button, for example. This is in contrast to 
typing in carefully composed commands, a process which is 
anything but intuitive. The GUI is now the de facto standard 
in such operating systems as International Business
Machines (IBM) Corporation's OS/2 operating system and
the forthcoming Microsoft Windows 95 operating system.

While the GUI has been a major improvement in com- 
puter interfaces, the effective use of applications running 
under operating systems supporting a GUI still requires a 
knowledge of procedures to effectively use applications 
running on those operating systems. For example, users 
running an application (current application) frequently want 
to perform some unanticipated task in another application 
(auxiliary application) based in part on information in the 
current application. Currently, performing such tasks is 
time-consuming and cumbersome, requiring the user to 
determine what auxiliary application needs to be accessed, 
open a new window, import information from the current 
application, and other related tasks. Thus, as important as the 
GUI has been in making computer systems "user friendly", 
there still remains much improvement to be made to facilitate
use of computers by an increasingly large number of
people.

SUMMARY OF THE INVENTION 

It is therefore an object of the present invention to provide 
a multimodal natural language interface that interprets 
requests combining natural language input from the user 
with information selected from the current application and 
sends the request in the proper form to the appropriate 
application for processing. 

According to the invention, there is provided a multimodal
natural language interface that enables users to combine
natural language (spoken, typed or handwritten) input with input
selected by any standard means from an application the user
is running (the current application) to perform a task in
another application (the auxiliary application) without either
leaving the current application, opening new windows, etc.,
or determining in advance of running the current application
what actions are to be done in the auxiliary application.

The invention carries out the following functions: (1)
parsing of the combined multimodal input; (2) semantic
interpretation (i.e., determination of the request implicit in
the parse); (3) dialog providing feedback to the user
indicating the system's understanding of the input and interacting
with the user to clarify the request (e.g., missing information
and ambiguities); (4) determination of which application
should process the request and application program interface
(API) code generation; and (5) presentation of a response as
may be applicable. Functions (1) to (3) are carried out by the
natural language processor, function (4) is carried out by the
application manager, and function (5) is carried out by the
response generator.
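
By way of illustration only, the division of functions (1) to (5) among the three components can be sketched in Python as follows. The class name SemanticRequest, the function names, the toy parsing rule (the first word is the action) and the phone book entry are assumptions made for this sketch and are not part of the invention as described; the clarification dialog of function (3) is omitted.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class SemanticRequest:
    """Hypothetical stand-in for the semantic representation of a request."""
    action: str
    argument: str


def natural_language_processor(combined_input: str) -> SemanticRequest:
    # Functions (1) to (3). Toy rule assumed here: the first word is the
    # action and the remainder is its argument. No dialog is modeled.
    action, _, argument = combined_input.strip().partition(" ")
    return SemanticRequest(action.lower(), argument.strip())


def application_manager(request: SemanticRequest,
                        applications: Dict[str, Callable[[str], str]]) -> str:
    # Function (4). A plain function call stands in for generated API code.
    return applications[request.action](request.argument)


def response_generator(result: str) -> str:
    # Function (5). The response here is simply text.
    return f"Response: {result}"


def handle(combined_input: str,
           applications: Dict[str, Callable[[str], str]]) -> str:
    request = natural_language_processor(combined_input)
    return response_generator(application_manager(request, applications))


# Hypothetical auxiliary application: a phone book backed by a dictionary.
PHONE_BOOK = {"John Smith": "555-0100"}
APPLICATIONS = {"phone": lambda name: PHONE_BOOK[name]}

print(handle("phone John Smith", APPLICATIONS))  # Response: 555-0100
```

The sketch only fixes the order in which the components hand work to one another; each stage would be far richer in an actual embodiment.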

The invention allows the use of multimodal (spoken,
typed, handwritten) natural language input supplied by the
user combined with information selected from a current
application via any standard technique. The invention further
provides a unique combination and application of
techniques from artificial intelligence and computational
linguistics that have been used in other applications, e.g.,
natural language database query and machine translation, in
the area of user interfaces supporting cross-application
tasks. Together, these go beyond current state-of-the-art user
interfaces supporting cross-application tasks.

BRIEF DESCRIPTION OF THE DRAWINGS 

The foregoing and other objects, aspects and advantages
will be better understood from the following detailed
description of a preferred embodiment of the invention with
reference to the drawings, in which:

FIG. 1 is a block diagram showing a hardware configuration
on which the subject invention may be implemented;

FIG. 2 is a block diagram of the multimodal system
architecture according to the present invention;

FIG. 3 is a block diagram of a first example of the
operation of the multimodal system shown in FIG. 2;

FIG. 4 is a block diagram of a second example of the
operation of the multimodal system shown in FIG. 2;

FIG. 5 is a flow diagram showing the logic of the
combining multimodal linguistic input function of the
dispatcher;

FIG. 5A is an example of the combining multimodal
linguistic input function of the dispatcher;

FIG. 6 is a flow diagram showing the logic of the
application manager; and

FIG. 6A is an example of a concept/application registration
table used by the application manager.

DETAILED DESCRIPTION OF A PREFERRED 
EMBODIMENT OF THE INVENTION 


Referring now to the drawings, and more particularly to
FIG. 1, there is shown a representative hardware environment
on which the subject invention may be implemented.
This hardware environment may be a personal computer,
such as IBM's PS/2 family of Personal Computers,
running an operating system capable of supporting
multitasking, such as IBM's OS/2 operating system. The
hardware includes a central processing unit (CPU) 10, which
may conform to Intel's X86 architecture or may be a reduced
instruction set computer (RISC) microprocessor such as
IBM's PowerPC® microprocessor. The CPU 10 is attached
to a system bus 12 to which are attached a read/write or
random access memory (RAM) 14, a read only memory
(ROM) 16, an input/output (I/O) adapter 18, and a user
interface adapter 22. The RAM 14 provides temporary
storage for application program code and data, while ROM
16 typically includes the basic input/output system (BIOS)
code. The I/O adapter 18 is connected to one or more Direct
Access Storage Devices (DASDs), here represented as a disk
drive 20. The disk drive 20 typically stores the computer's
operating system (OS) and various application programs,
each of which is selectively loaded into RAM 14 via the
system bus 12. The user interface adapter 22 has attached to
it a keyboard 24, a mouse 26, a speaker 28, a microphone 32,
and/or other user interface devices (not shown). The personal
computer also includes a display 38, here represented
as a cathode ray tube (CRT) display but which may be a
liquid crystal display (LCD) or other suitable display. The
display 38 is connected to the system bus 12 via a display
adapter 34. Optionally, a communications adapter 34 is
connected to the bus 12 and to a network, for example a local
area network (LAN), such as IBM's Token Ring LAN.
Alternatively, the communications adapter may be a modem
connecting the personal computer or workstation to a telephone
line as part of a wide area network (WAN).

The preferred embodiment of the invention is implemented
on a hardware platform as generally shown in FIG.
1. The architecture of the multimodal natural language
interface according to the invention will now be described
followed by specific examples of its operation. The multimodal
natural language interface is linked to applications
permitting users, from within a current application, to perform
actions in an auxiliary application without the necessity
of opening new windows or similar procedures. The
term "multimodal" refers to the feature of combining input
from various modalities; e.g., combining spoken, typed, or
handwritten input from the user with input selected from an
application the user is running by any standard means,
including point-and-click, touch, and keyboard selection.

With reference now to FIG. 2, there is shown the basic
architecture of the system. The user input may be spoken,
typed, handwritten, mouse controlled cursor, touch, or any
other modality. In the illustrated example, speech is input via
microphone 32 (FIG. 1). The speech input, "Find address",
is supplied to a speech recognizer 41 which generates an
output. At the same time, the user may also provide non-speech
input; e.g., by keyboard 24, mouse 26, a touch screen
(not shown) attached to display 38, or the like. As mentioned,
the multimodal input contemplates handwritten input as
well, and this may be accommodated by means of a stylus
and tablet (not shown) or the mouse 26. This non-speech
input is received by the screen manager 42, such as the
Presentation Manager (PM) of the OS/2 operating system.
The screen manager 42 also provides a display window
for application A, the current application, here shown as
being accessed from a direct access storage device (DASD)
43, such as the hard disk 20 (FIG. 1). Within the window for
application A, there is an "Item-in-Focus", such as text or a
graphic.

The output of the speech recognizer 41 and the non-speech
input received by the screen manager 42 are sent to
a dispatcher 44 which combines the inputs and directs the
combined input first of all to a natural language processor
45. The natural language processor 45 directs the combined
multimodal input to a parser/semantic interpreter 46 which
accesses grammars and dictionaries on DASDs 47 and 48,
which may be the same or different hard disk 20 (FIG. 1) on
which application A resides. The parsed input is subjected to
further semantic interpretation by the dialog manager 49,
again with the aid of the grammars and dictionaries on
DASDs 47 and 48. The natural language processor 45
provides feedback to the user via the dispatcher 44 to
indicate the system's understanding of the input. If
necessary, the natural language processor 45 interacts with
the user to clarify any missing information or ambiguities in
the request. The techniques employed by the natural language
processor 45, parser 46 and dialog manager 49 are
common in the area of natural language query database
systems. Examples of commercially available natural language
query database systems are IBM's "LanguageAccess"
and NLI's "Natural Language" products.
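
For illustration, the combining step performed by the dispatcher 44 may be sketched as below, using the simple concatenation described later with reference to FIG. 5. The class and method names are hypothetical, and the sketch assumes that each input has already been reduced to text (e.g., by the speech recognizer 41 or the screen manager 42). It also reflects the point that the two inputs may arrive in either temporal order.

```python
from typing import Optional


class Dispatcher:
    """Toy dispatcher that combines a user request with the item in focus."""

    def __init__(self) -> None:
        self.user_input: Optional[str] = None
        self.item_in_focus: Optional[str] = None

    def receive_user_input(self, text: str) -> None:
        # Text from the speech recognizer, keyboard, or handwriting input.
        self.user_input = text.strip()

    def receive_item_in_focus(self, text: str) -> None:
        # Selection reported by the screen manager for the current application.
        self.item_in_focus = text.strip()

    def combined_input(self) -> Optional[str]:
        # Nothing is emitted until both parts are present; arrival order
        # does not matter.
        if self.user_input is None or self.item_in_focus is None:
            return None
        # The user input is always placed first, then the item in focus.
        return f"{self.user_input} {self.item_in_focus}"


dispatcher = Dispatcher()
dispatcher.receive_item_in_focus("Joe Smith")   # selection arrives first
dispatcher.receive_user_input("Find address")   # spoken request arrives second
print(dispatcher.combined_input())  # Find address Joe Smith
```

Recovering the intended meaning from the concatenated string is left entirely to the natural language processor 45, which this sketch does not model.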

Based on the output of the natural language processor 45,
the dispatcher 44 invokes the application manager 51 to
determine which application should process the request.
Note that in the prior art the application manager of the
operating system would have to be invoked by the user to
first open a window for a selected application and then the
application would have to be started and run in that window.
The user would then have to access the requested information
and then, using a clipboard function, copy and paste the
information into the original application window. According
to the invention, this is all done automatically without any
intervention by the user. For example, the application manager
51 may access any of applications B to Z on DASDs 52
to 53, again which may be the same or different hard disk 20
(FIG. 1) on which application A resides. The application
accessed is the auxiliary application. The application manager
51 determines which of applications B to Z has the
requested information. The application manager 51 may
determine that a database program, say application B, contains
an address file where the requested information resides.
The application manager 51 sends a semantic representation
of the request to the API code generator for application B
which, in turn, generates the application program interface
(API) code required to access the requested information.
This is done without opening a window. The auxiliary
application (e.g., the database program) is opened in the
background and the API code (e.g., query) is generated to
retrieve the requested information. Once the information has
been accessed by the application manager 51, the requested
information is supplied to the dispatcher 44 which then
dispatches the information to the response generator 54. The
response generator 54 then generates a response appropriate
to the nature of the request and the current application. This
response can be speech, from a synthesizer (not shown), text
in a pop up window, text or a graphic which is pasted into
the current application, a video clip, or the like.
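
One way to picture how the application manager 51 may determine which application should process a request is the concept/application registration table described with reference to FIG. 6 and FIG. 6A, in which the applications registered with every application-specific concept are found by set intersection. The following is a minimal sketch under stated assumptions: the table contents, the concept names, the application names and the function name are all hypothetical, and FIG. 6A is not reproduced here.

```python
from typing import Dict, Iterable, Set

# Hypothetical concept/application registration table: each application-specific
# concept maps to the set of application names registered with it.
REGISTRATION_TABLE: Dict[str, Set[str]] = {
    "address": {"database", "mail"},
    "phone": {"database", "dialer"},
    "person": {"database", "mail", "dialer"},
}

# Concepts assumed to be flagged as application independent in the dictionary.
APPLICATION_INDEPENDENT = {"find"}


def select_applications(concepts: Iterable[str]) -> Set[str]:
    """Return the applications registered with every application-specific concept."""
    # One Concept-Application Set per application-specific concept; a concept
    # that is not in the table contributes an empty set.
    concept_application_sets = [
        REGISTRATION_TABLE.get(concept, set())
        for concept in concepts
        if concept not in APPLICATION_INDEPENDENT
    ]
    if not concept_application_sets:
        return set()
    # The Application Set is the intersection of all Concept-Application Sets.
    return set.intersection(*concept_application_sets)


print(select_applications(["phone", "address", "person"]))  # {'database'}
```

An Application Set with more than one name corresponds to an ambiguous request, and an empty Application Set to a request that is not meaningful for any registered application; how either case is handled is left open in this sketch.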

Consider now a specific example with reference to FIG.
3. If the current application (application A) is a word
processor and the user is writing a letter to Joe Smith, after
typing Joe Smith's name via keyboard 24, the user may
provide the speech input, "Find address". The combined
multimodal input, the typed name of Joe Smith ("Item-in-Focus"
in FIG. 2) and the spoken request "Find address", is
processed by the natural language processor 45 and supplied
by the dispatcher 44 to the application manager 51, here
represented by the "Ask-It" block 55. In the example
described, the combined input is "Find address (of) Joe
Smith". The function performed is to access a names and
addresses file 56 via a database program on DASD 52 and
retrieve Joe Smith's address. The appropriate response is to
paste the retrieved address of Joe Smith in the letter being
written by the word processor application (application A).

Consider next the example shown in FIG. 4. The user has
typed in Joe Smith's name, but now instead of requesting an
address, the user provides the speech input "Phone". There
are several possible answers illustrated in the example of
FIG. 4. The first is to retrieve Joe Smith's telephone number.
However, if there are two Joe Smiths in the database, then
there is an ambiguity that must be clarified before a final
response can be generated. The dialog manager 49 (FIG. 2)
will provide a choice to the user, perhaps in a pop-up
window, and request the user to select one of the choices. On
the other hand, there may be no Joe Smith listed in the
phonebook, in which case there is not enough information in
the request to process it. The dialog manager 49 would then
inform the user that there is no Joe Smith listed and ask for
more information, such as "Should I look elsewhere". This
response could be a text display in a pop up window, for
example, or synthesized speech. Ultimately, when the telephone
number is located, the response would be either a
listing of the number itself or the number would be dialed
via the communications adapter 34 (FIG. 1).

The functions which support the multimodal natural language
interface are the dispatcher 44 and the application
manager 51 shown in FIG. 2. With reference now to FIG. 5,
the dispatcher function is illustrated by way of a flow
diagram. The user input, I1, and the item-in-focus input, I2,
from the current application are simply concatenated in
function block 56 as "user input"+"item-in-focus". The
grammar and semantic interpretation rules used in the natural
language processor 45 ensure the intended meaning is
recovered. As mentioned, various state of the art natural
language processing systems can be used to perform the
function of the natural language processor 45. Even if the
concatenated input to the natural language processor 45 does
not match the natural order of the natural language
processed, the natural language processor will still recover
the intended meaning. For example, if the concatenated
input were "send to Mary"+<filename>, meaning "send to
Mary financial data", the natural language processor 45
would understand this by the correct English expression
"send <filename> to Mary" meaning "send financial data to
Mary" since the natural language processor can analyze
unusual word orders by supplying the appropriate grammatical
rules. A significant ease of use advantage of this system
is that the user input and the input supplied from the current
application can be input in either temporal order or even
overlap in time.

FIG. 5A provides another example of the operation of the
dispatcher function 56. In this case, the user input is "phone"
and the application input is "John Smith". The dispatcher
concatenation function is to output "phone John Smith" to
the natural language processor.

The flow diagram of the application manager 51 is shown
in FIG. 6, to which reference is now made. For a given input,
the application manager finds all concepts in the semantic
representation provided by the natural language processor
45 in function block 61 and then, in function block 62,
determines from the semantic representation each application
that is registered with every concept in the semantic
representation. This determination is made by referencing a
concept/application table 63. Some concepts might be stipulated
to be application independent, and those would not
need to be considered. Such concepts could be identified by
a flag set in a dictionary. Each application-specific concept
is listed along with the names of the applications registered
with that concept in the concept/application registration
table 63. This is logically just a table where, without loss of
generality, the columns are labeled with application names
and the rows with concept names. An example is shown in
FIG. 6A. Once the set of application-specific concepts is
determined, each such concept is looked up in the concept/application
registration table, and the associated set of
registered application names is returned. Each concept thus
results in a set of application names being produced, which
may be referred to as a "Concept-Application Set". After
each concept has been processed, the result is a collection of
Concept-Application Sets, one set of application names for
each application-specific concept looked up in the concept/application
registration table 63. The name of each application
that occurs in every Concept-Application Set derived
from the semantic representation is determined.
Logically, this can be done by simple set intersection. The
result is a set of application names (Application Set), all of
which are registered with each application-specific concept
derived from the semantic representation of the input.

Next, in function block 64, the application manager sends
the semantic representation to the API code generator 65 of
each such application. Typically, there will be only one, but
nothing precludes more than one application name occurring
in the Application Set. In such a case, the input is truly
ambiguous, and the system could either report this to the user
via the dispatcher or simply submit the semantic representation
to each of the named application API code generators
or both. Nothing in the architecture hinges on this choice and
a parameter could be set to determine the actual behavior of
the system in particular circumstances. It is also possible that
the Application Set is empty, corresponding to an input that
was not meaningful with respect to the applications registered
with the system in the concept/application registration
table 63. This would be reported back to the dispatcher
for processing, e.g., interaction with the user to
determine the next action, if any. Assuming that an application
is found and the semantic representation is sent to that
application's API code generator in function block 65, the
application then acts on the code in function block 66 to
retrieve the data requested.

While the invention has been described in terms of a
preferred embodiment, those skilled in the art will
recognize that the invention can be practiced with modification
within the spirit and scope of the appended claims.

Having thus described my invention, what I claim as new
and desire to secure by Letters Patent is as follows:

1. A multimodal natural language interface for a computer
system which interprets user requests combining natural
language input from the user with information selected from
a current application running on the computer system and
sends the request in proper form to an appropriate auxiliary
application for processing, the multimodal natural language
interface comprising:

a dispatcher receiving a natural language input from the
user and combining the natural language input with
input information selected from a current application to
form a combined multimodal request;

a parser receiving the combined multimodal request for
parsing the combined multimodal request;

a natural language processor performing semantic interpretation
of the parsed combined multimodal request
and generating a semantic representation of the combined
multimodal request;

an application manager receiving the semantic representation
from the natural language processor for determining
which auxiliary application should process the
request, said application manager invoking the auxiliary
application and generating application program
interface (API) code to access requested information
via the auxiliary application, the accessed requested
information being supplied to said dispatcher; and

a response generator receiving the accessed requested
information from the dispatcher for generating a
response as may be applicable to the user's request.

2. The multimodal natural language interface recited in 
claim 1 further comprising a dialog manager providing 
feedback to the user indicating the system's understanding 
of the input and interacting with the user to clarify the 
request, if necessary. 

3. The multimodal natural language interface recited in 
claim 2 wherein said dispatcher forms the combined multi- 
modal request by concatenating the user natural language 
input with the input information selected from the current 
application running on the system.

4. The multimodal natural language interface recited in
[...] application registration table, said application manager
finding all concepts in the semantic representation from the
natural language processor and then finding all applications
registered in said concept/application registration table for
those concepts.

5. A method implemented in a computer system for
interpreting user requests by combining natural language
input from a user with information selected from a current
application running on the computer system comprising the
steps of:

receiving a natural language input from the user and
combining the natural language input with input information
selected from a current application to form a
combined multimodal request;

parsing the combined multimodal request;

performing semantic interpretation of the parsed combined
multimodal request to generate a semantic representation
of the combined multimodal request;

determining which auxiliary application should process
the request;

invoking the auxiliary application and generating application
program interface (API) code to access
requested information via the auxiliary application; and

receiving the accessed requested information and generating
a response as may be applicable to the user's
request.

6. The method recited in claim 5 further comprising the
step of providing feedback to the user indicating the system's
understanding of the input and interacting with the
user to clarify the request, if necessary.

7. [...] combining is performed by concatenating the user natural
language input with the input information selected from the
current application running on the system.

8. The method recited in claim 7 further comprising the 
steps of: 

generating a concept/application registration table; 
finding all concepts in the semantic representation; and 
then finding all applications registered in said concept/ 
application registration table for those concepts. 